1. Introduction

Cyber-attacks can maliciously disable machines, steal data, or use an infected machine as a launch point for other attacks. Distributed Denial of Service (DDoS) attacks are a type of attack that sends fake requests to a machine, flooding and overloading the system. This leads to the machine crashing or being rendered unusable. Such attacks can cost firms large sums of money, since they lose money for every second their servers malfunction.


For example, the first few months of 2022 saw an unexpected increase in the number and duration of DDoS attacks, predominantly due to Russia’s invasion of Ukraine.¹ Although not all DDoS attacks have political implications, they are a powerful tool for cyber warfare. This type of cybercrime has become increasingly common and can be used to attain nefarious goals.


A DDoS (distributed denial-of-service) attack is one of the most dangerous attacks, in which the attacker aims to make a resource or server unavailable to its intended users. There are multiple ways to perform this, such as queuing requests, creating unterminated sessions, and overloading packet sizes. The attack is so dangerous because the requests are initiated by many compromised devices, making it harder to distinguish between genuine and malicious requests; a per-IP or per-device rate limit does not prevent it, since each device can stay under the limit. A recent example is Cloudflare, one of the largest content delivery networks, responsible for delivering over 7,000,000 websites, which was the target of one of the most significant DDoS attacks in history. Incidents like this make it pressing to find ways to detect and mitigate DDoS attacks efficiently.
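To illustrate why a per-IP rate limit does not stop a distributed attack, the sketch below implements a simple fixed-window limiter. The policy (at most `limit` requests per IP in each `window` seconds) and all IP addresses are assumptions chosen for illustration. A single aggressive client is throttled, but a thousand compromised devices that each stay under the limit all get through, even though their combined volume is far higher.

```python
from collections import defaultdict


class PerIpRateLimiter:
    """Fixed-window limiter: at most `limit` requests per IP in each `window` seconds.

    Simplified for illustration: counters for old windows are never cleaned up.
    """

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (ip, window index) -> requests seen so far

    def allow(self, ip: str, timestamp: float) -> bool:
        key = (ip, int(timestamp // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


limiter = PerIpRateLimiter(limit=10, window=1.0)
# One aggressive client sends 100 requests within one second: only the first 10 pass
single = sum(limiter.allow("203.0.113.1", 0.5) for _ in range(100))
# 1,000 compromised devices send 5 requests each in the same second: every one passes
botnet = sum(
    limiter.allow(f"10.0.{i // 256}.{i % 256}", 0.5) for i in range(1000) for _ in range(5)
)
print(single, botnet)  # 10 5000
```

The limiter behaves correctly for each individual IP; it simply has no view of the aggregate pattern, which is the gap a pattern-based detector is meant to fill.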


HTTP flooding is a common form of DDoS attack. The compromised systems make continuous requests to a web server, using up its resources and preventing users from accessing them.² In addition, the requests are usually sent to endpoints that require many resources to process, such as querying and processing large amounts of data from a database, to increase each request's overall impact. These attacks consume large amounts of bandwidth and computing power and deny these resources to genuine users; hence, it is necessary to identify them automatically.


1 Hacken. “How to Detect a DDoS Attack? - 5 Red Flags - Hacken.” Hacken, 8 Aug. 2022, hacken.io/discover/how-to-detect-a-ddos-attack/. Accessed 13 Aug. 2022.


An individual DDoS request is hard to identify, as no single definite factor defines it. Although such requests target endpoints with a high packet size and processing time, it is hard to tell whether an individual request is compromised. Instead, compromised requests must be identified from the pattern of incoming requests while considering various factors.


Continuous monitoring is a popular tool for detecting DDoS attacks, since it can be used to automatically deploy safety measures and alert the IT team when there is an anomaly in the requests. While it may speed up the mitigation process, it also requires more manual labor and may be ineffective if the monitoring is too strict. However, since the request pattern of each client can be analyzed to determine whether it is malicious, neural networks, owing to their pattern recognition ability, are a potent tool for detecting such attacks.


This paper seeks to further investigate the extent to which a trained feed forward neural network can detect an HTTP flood DDoS attack, specifically upon receiving live data when used as a proxy.


1.1 Worthiness 


The exposure to and impact of DDoS attacks grow as the number of web applications increases by the hour. This research could benefit everyone from small businesses that have just launched their application to large-scale companies. By correctly classifying each request and recognizing patterns, a neural network can be used as a proxy to intercept every request, verify it, and forward only legitimate ones. This could save firms thousands of dollars and protect small businesses from attacks that could force them to shut down. Moreover, the data and analysis from this research could be extended to other DDoS attacks, such as SYN floods, UDP floods, and ICMP floods.


2 “What Is a Distributed Denial-of-Service (DDoS) Attack?” Cloudflare, 2023, www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/. Accessed 11 July 2022.

3 Hacken. “How to Detect a DDoS Attack? - 5 Red Flags - Hacken.” Hacken, 8 Aug. 2022, hacken.io/discover/how-to-detect-a-ddos-attack/. Accessed 13 Aug. 2022.


This investigation also aims to record and compare the performance and timings of the neural network against the time taken to fulfill the request, to understand the impact of the neural network on the server.
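The proxy idea described in this section can be sketched as a small wrapper that sits in front of the real request handler: each incoming request is turned into features, classified, and forwarded only if it is judged legitimate. The names `extract_features`, `classify`, and `forward` are hypothetical placeholders rather than the paper's code, and a trivial threshold rule stands in for the trained network.

```python
from typing import Callable, Dict, Optional


def make_proxy(extract_features: Callable[[Dict], list],
               classify: Callable[[list], float],
               forward: Callable[[Dict], str],
               threshold: float = 0.5) -> Callable[[Dict], Optional[str]]:
    """Wrap `forward` so that only requests classified as legitimate reach the server."""

    def handle(request: Dict) -> Optional[str]:
        probability_compromised = classify(extract_features(request))
        if probability_compromised >= threshold:
            return None  # dropped: treated as part of the flood
        return forward(request)

    return handle


# Hypothetical stand-ins: a rule on the recent request count instead of a trained network
proxy = make_proxy(
    extract_features=lambda req: [req["requests_past_5s"]],
    classify=lambda features: 1.0 if features[0] > 20 else 0.0,
    forward=lambda req: f"200 OK {req['endpoint']}",
)
print(proxy({"endpoint": "/endpoint-2", "requests_past_5s": 2}))    # 200 OK /endpoint-2
print(proxy({"endpoint": "/endpoint-4", "requests_past_5s": 150}))  # None
```

Because the classifier runs on every request before it is forwarded, its latency is added to every request, which is why the classification time is measured alongside the request fulfillment time.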


1.2 Scope 


DDoS datasets, particularly for HTTP floods, usually contain sensitive data about the users and the server in use, since this information is often part of the requests. Hence, such datasets are often found on malicious platforms that must be accessed using the Tor network, and using them for such an investigation could be unethical. Thus, to conduct this experiment, a dataset will be generated by simulating a DDoS attack.


DDoS attacks involve thousands of devices, so they are very costly and complicated to simulate and require extensive hardware access. Thus, an experiment simulating a small-scale DDoS will be conducted to answer the posed research question and achieve the paper’s aim. Firstly, large data sets consisting of DDoS emulations will be generated, including data about the HTTP requests (user IP, endpoint, time taken, packet size, etc.). After the data sets are generated, an artificial neural network (ANN) will be trained to recognize DDoS patterns.


Since the performance and results of a neural network depend heavily on its hyperparameters, the hyperparameters used in this investigation will be selected by analyzing and understanding the preprocessed data. Moreover, to further measure the extent to which a neural network can detect the attacks, the model will be trained with different numbers of hidden layers. The results of these models will then be compared and evaluated.


2. Background Information 


2.1 Machine Learning and its Types 


Pattern recognition is based on machine learning, the study of programming computers to perform tasks they have not been explicitly programmed to do.⁴ Machine learning models are trained to recognize, analyze, and learn from data and to perform complex tasks (e.g., identifying patterns and predicting events). They can be trained via multiple strategies, such as unsupervised, supervised, and reinforcement learning, each with its own advantages and disadvantages. Supervised learning allows a network to identify and create mappings between the features of the data and their classifications.⁵


There are also several types of machine learning algorithms. Pattern recognition, however, requires classification, in which the computer learns the relations between the data and their labels. For example, a machine could be given a collection of texts grouped by language, and a classification network would analyze the features of those texts, attempting to relate specific visual characteristics to certain languages. This process is referred to as “training.” If successful, the network would eventually be able to accurately predict the language of texts it has not seen before, using the relations between text features and languages that it learned during training. Pattern recognition algorithms are trained similarly, with thousands of labeled entries. Since the main goal of this investigation is to classify network requests, and because labeled data is available, supervised learning will be used for this research.


4 Ng, Andrew. “Supervised Machine Learning: Regression and Classification.” Coursera, 2022, 
www.coursera.org/learn/machine-learning. Accessed 13 Aug. 2022. 

5 Salian, Isha. “NVIDIA Blog: Supervised vs. Unsupervised Learning.” The Official NVIDIA Blog, 2 Aug. 2018, 
blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/. Accessed 28 Aug. 2022. 
2.2 Chosen Machine Learning Model


DDoS attacks can be detected using many different types of machine learning models. Feed forward neural networks, support vector machines (SVM), and random forests are some of the most common and popular methods for detecting them.⁶


Feed forward networks are applicable to most pattern detection scenarios. Since I have briefly worked with these networks before, I will study them further and use a feed forward network for this investigation. The inner workings of a feed forward network are discussed below.


2.3 Feed Forward Neural Networks 


Inspired by how brains function, neural networks are made up of neurons. Feed forward neural networks arrange these neurons in layers: the input layer, the hidden layers, and the output layer, and each layer identifies certain patterns within the data. The purpose of an artificial neuron is to receive inputs, perform calculations, and produce an output, which it passes on to the next layer of neurons. The input is represented by the first layer of neurons, which activates each consecutive layer in turn until the output layer, representing the output itself, is activated.⁷


Figure 1: Structure of a feed forward neural network, with an input layer, hidden layers, and an output layer (self-made)


6 Aytac, Tugba, et al. “Detection DDOS Attacks Using Machine Learning Methods.” Electrica, vol. 20, no. 2, 15 
June 2020, pp. 159-167, https://doi.org/10.5152/electrica.2020.20049. Accessed 7 Sept. 2022. 

7 Sanderson, Grant. “Neural Networks - YouTube.” YouTube, 2019, www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi. Accessed 9 Oct. 2022.
To receive and pass data, neurons rely on links, whose strengths are referred to as weights. The value a neuron passes forward is called its activation.⁸ Every neuron after the input layer takes its input from the previous layer: the activations of all the neurons in that layer, each multiplied by the weight of its link to the neuron. Moreover, each neuron also has a bias, which indicates how active the neuron tends to be in general (a negative bias means the neuron usually has a low activation, whereas a positive bias means it usually has a high activation). The activation of the nth neuron in layer j, which is preceded by layer i with N neurons, is given by:


$$A_{j,n} = \sigma\left(\sum_{p=1}^{N} \left(w_{n,p} \times A_{i,p}\right) + b_{j,n}\right)$$


Equation 1: Activation of the nth neuron in layer j, given consecutive layers i and j


Here, $A_{j,n}$ represents the activation of the nth neuron in layer j, $w_{n,p}$ is the weight of the link between that neuron and the pth neuron of layer i, $\sum_{p=1}^{N} (w_{n,p} \times A_{i,p})$ represents the sum of the products of the weights and activations of the neurons in the layer before, and $b_{j,n}$ is the bias of the neuron. These biases and weights are assigned randomly using a seed when a network is created and are adjusted as the model trains. $\sigma$ represents the activation function, a mathematical function that maps this weighted sum to the neuron's activation (for example, the sigmoid function squashes any input into the range 0 to 1); the choice of activation function also affects the network's performance and efficiency.¹⁰
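Equation 1 can be checked numerically. The NumPy sketch below computes the activations of a whole layer at once as a matrix-vector product, and also neuron by neuron with the explicit sum from Equation 1, so the two can be compared. The sigmoid is assumed for $\sigma$ here; the paper does not state which activation function its networks use.

```python
import numpy as np


def sigmoid(z):
    """Squash any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_activation(prev_activations, weights_n, bias_n):
    """Equation 1 for a single neuron n, written as an explicit sum over p = 1..N."""
    total = sum(w * a for w, a in zip(weights_n, prev_activations))
    return sigmoid(total + bias_n)


def layer_activations(prev_activations, weights, biases):
    """Equation 1 for every neuron of layer j at once.

    prev_activations: shape (N,), the activations A_{i,p} of layer i
    weights:          shape (M, N), weights[n, p] = w_{n,p}
    biases:           shape (M,), the biases b_{j,n}
    """
    return sigmoid(weights @ prev_activations + biases)


a_prev = np.array([1.0, 0.0])
w = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([0.0, -1.0])
print(layer_activations(a_prev, w, b))  # [0.5 0.5]
```

The vectorized form is how frameworks such as TensorFlow evaluate a dense layer in practice; the per-neuron loop mirrors the summation in Equation 1 term by term.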


The “output”, or result, of the network is the set of activations of the neurons in the output layer. Each output neuron represents a unique answer, and its activation represents the probability that this answer is correct.¹¹ The answer with the highest probability is assumed to be the correct one. During training, the network compares its output with the real answer, from which a “cost” is determined. The cost represents the error of the network (the lower, the better) and can be calculated using a range of different methods, such as mean squared error, mean absolute error, and root mean squared error.¹² Moreover, because the network is essentially a mathematical function, the cost can also be represented as a function of the network's weights and biases. This cost function can then be minimized to increase the accuracy of the network, which is what training the network means. This is done by calculating the gradient of the cost function at the current weights and biases and tweaking them in the opposite direction, descending downhill to minimize the cost. As the cost is minimized during training, the network becomes better at predicting the answer, given a set of features.¹³


8 Sanderson, Grant. “Neural Networks - YouTube.” YouTube, 2019, www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi. Accessed 9 Oct. 2022.
9 IBID

10 Sanderson, Grant. “Neural Networks - YouTube.” YouTube, 2019, www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi. Accessed 9 Oct. 2022.
11 IBID


Personal Code: jgc887


Hence, a neural network can be viewed as a function that produces a certain number of outputs given specific inputs and parameters.¹⁴
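The cost functions and the gradient-descent idea described above can be sketched in NumPy. To keep the derivatives readable, the step below minimizes the mean squared error of a one-feature linear model `y_hat = w * x + b` rather than of a full network; this simplified model is an assumption chosen purely for illustration. Repeating the step moves `w` and `b` downhill on the cost surface.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mse(y_true, y_pred)))


def gradient_descent_step(w, b, x, y, learning_rate):
    """One step downhill on the MSE of the linear model y_hat = w * x + b."""
    error = (w * x + b) - y
    grad_w = 2.0 * np.mean(error * x)  # partial derivative of the MSE with respect to w
    grad_b = 2.0 * np.mean(error)      # partial derivative of the MSE with respect to b
    return float(w - learning_rate * grad_w), float(b - learning_rate * grad_b)


# Recover y = 2x + 1 starting from w = b = 0
x = np.linspace(0.0, 1.0, 11)
y = 2.0 * x + 1.0
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = gradient_descent_step(w, b, x, y, learning_rate=0.1)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

A neural network is trained the same way, except that the gradient with respect to every weight and bias is computed by backpropagation instead of by hand.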


3. Experiment Methodology 


Primary experimental data sets are the main source of data for this paper. An experimental methodology was chosen because there was too little secondary data to draw conclusions from, and this method provides the freedom to train and test the model with primary data.


3.1 Generation of Data Sets 


The data sets were generated by “simulating” a DDoS. This was done by writing two programs, a client and a server, which mock a user (compromised or genuine) and a server respectively (refer to Appendix 1 for code). The clients were deployed on 35 different virtual private servers (VPSs) across ten geographical locations worldwide, as seen in Figure 2 below, using the computing providers Linode and DigitalOcean.


12 Saini, Hrithik. “7 Types of Cost Functions in Machine Learning | Analytics Steps.” Www.analyticssteps.com, 
www.analyticssteps.com/blogs/7-types-cost-functions-machine-learning. Accessed 18 Aug. 2022. 

13 IBID 

14 IBID 
Figure 2: Geographical locations of clients used in the experiment¹⁵
The client is structured as shown in Figure 3.1 below, exposing two endpoints, compromised/ and user/, which give the researcher control over the client. The same client structure is replicated on all 35 VPSs worldwide to act as a distributed system. While there were multiple clients, a single, unchanged server was used throughout.


Figure 3.1: Client-side structure and logic (an HTTP POST to compromised/ starts compromised requests to random endpoints, as fast as possible; an HTTP POST to user/ starts genuine requests to random endpoints every 3-5 seconds)


15 Self-made using Google My Maps, mymaps.google.com. Accessed 19 Sept. 2022. 
Figure 3.2: Server-side structure¹⁶
The server is designed to keep the logical layer oblivious to the logging process, allowing the codebase to remain maintainable and helping maintain the strict conditions under which the ANN is to be trained (i.e., acting as middleware or a firewall).


The method below was used to conduct the simulation and generate the datasets: 

1) 30 of the 35 devices initially act as “real” clients, making requests to random endpoints every 3-5 seconds. The remaining five clients act only as compromised devices and do not make real requests.

2) 5-6 minutes later (randomly chosen), 15 of the 35 devices (including the five inactive ones mentioned in point 1) start simulating “compromised” clients by making requests to random endpoints at their maximum load. This emulates the situation where the attacker starts the DDoS and the compromised devices flood the server with requests. During this time, the other 20 devices continue to act as “real” users, as they would in the real world.

3) After 5-7 minutes (randomly chosen), the malicious clients stop the DDoS: each one is stopped by terminating the ongoing request from the researcher’s machine to the client’s compromised endpoint.


4) The rest of the clients continue to make requests as normal users.

5) The data logged earlier is then transferred to the researcher’s local machine in CSV format using SCP.

6) The process is repeated four more times, generating a total of 5 large datasets, each emulating a unique DDoS request pattern.


16 Figures 3.1 and 3.2 are self-made using excalidraw. “Excalidraw.” Excalidraw, excalidraw.com/.


Since each client knows whether it is acting as a malicious or a real user, each request is labelled as compromised or not at the moment it is made.
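The client behaviour in steps 1 to 3, together with labelling at request time, can be sketched as a loop. This is not the paper's client (that code is in Appendix 1): `send`, `stop`, and the endpoint list are hypothetical stand-ins, where `send` would perform the actual HTTP request and `stop` reports when the researcher has ended the current mode. Injecting `sleep` and the random generator keeps the sketch testable without a network.

```python
import random
import time
from typing import Callable, List, Optional

# Hypothetical endpoint list; Table 1 shows /endpoint-2 and /endpoint-4 among the real ones
ENDPOINTS: List[str] = [f"/endpoint-{k}" for k in range(1, 6)]


def run_client(send: Callable[[str, bool], None],
               compromised: bool,
               stop: Callable[[], bool],
               sleep: Callable[[float], None] = time.sleep,
               rng: Optional[random.Random] = None) -> int:
    """Send labelled requests to random endpoints until stop() is True; return the count sent."""
    if rng is None:
        rng = random.Random()
    sent = 0
    while not stop():
        endpoint = rng.choice(ENDPOINTS)
        # The client knows its own role, so every request carries its label when it is made
        send(endpoint, compromised)
        sent += 1
        if not compromised:
            # Genuine users wait 3-5 seconds between requests
            sleep(rng.uniform(3.0, 5.0))
        # Compromised clients loop again immediately, sending as fast as possible
    return sent


# Usage with a fake sender that records requests instead of sending them
log = []
run_client(lambda endpoint, label: log.append((endpoint, label)),
           compromised=True, stop=lambda: len(log) >= 3)
print(len(log), all(label for _, label in log))  # 3 True
```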


Each experiment iteration was conducted automatically using Python 3.9, which controlled the clients and their states (refer to Appendix 1 for code). Table 1 shows a sample of the data points collected.


ip              | endpoint    | time       | packet_size | time_taken  | compromised
139.144.44.193  |             | 1674598718 | 100         | 0.057705225 | FALSE
170.187.139.144 | /endpoint-2 | 1674598721 | 359         | 0.051176134 | FALSE
157.245.104.1   | /endpoint-4 | 1674599070 | 5347        | 0.161323308 | TRUE

headers (identical in all three rows): Host: 165.232.182.157; User-Agent: python-requests/2.28.1; Accept-Encoding: gzip, deflate; Accept: */*; Connection: keep-alive; Content-Type: application/json


Table 1: Sample raw data. A screenshot of the entire data can be seen in Appendix 37 


17 The descriptions and units for the fields can be seen in Table 2.1 
The packet size and time taken of the requests can be plotted against the time they were received to visualize the DDoS simulation. Since there is a very large number of requests, a random sample of 0.2% of the requests is plotted below.
Figure 4.1: Packet size of each request vs time since simulation started (seconds)
Figure 4.2: Time taken for each request (seconds) vs time since simulation started (seconds)¹⁸


Compromised requests have a higher time taken and packet size. When the pattern is viewed as a whole, an attack can be recognized. However, it is much harder to recognize whether an individual request is compromised just by looking at the information above.


3.2 Processing the datasets for use 


The datasets generated must be preprocessed before they can be fed into the neural network as inputs. Feature extraction is essential, since it derives more specific information from the data, allowing the network to find more intricate patterns. Moreover, eliminating unneeded variables greatly reduces the number of features, which reduces the time the network needs to learn and generalize.¹⁹


For each request, features such as the average packet size and average time taken over the client's past 5 requests, and the number of requests made by the client in the past 5 seconds, were calculated. Moreover, data such as the IP address and headers were removed, as they do not contain information that can be used to recognize a DDoS. The five processed data sets were then merged and shuffled. Lastly, the data was split into training and evaluation data with a 4:1 split ratio²⁰ (80% of the data was used for training, while 20% was used for evaluation). The preprocessing was done using Python 3.9 and the scikit-learn, pandas, and numpy libraries, as seen in the figure below.


18 The diagrams are self-made using matplotlib and python 3.9

19 Chatterjee, Sampriti. “What Is Feature Extraction? Feature Extraction in Image Processing.” Great Learning Blog: Free Resources What Matters to Shape Your Career!, 29 Oct. 2021, www.mygreatlearning.com/blog/feature-extraction-in-image-processing. Accessed 13 Oct. 2022.


Figure 5: Python library imports used for preprocessing (pandas and numpy to load CSV files and preprocess data; scikit-learn to split processed data into training and testing data)
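The preprocessing steps above can be sketched with pandas and numpy. This is a minimal sketch under stated assumptions, not the author's code: the names of the derived columns are hypothetical (only `packet_size`, `time_taken`, `compromised`, `ip`, and `time` come from Table 1), the 5-request averages are assumed to include the current request, and the 5-second count is assumed to exclude it. The shuffle and 80/20 split use pandas' `DataFrame.sample` for brevity; scikit-learn's `train_test_split(..., test_size=0.2)`, which the paper used, does the same job.

```python
import numpy as np
import pandas as pd

WINDOW_REQUESTS = 5  # averages use the client's last 5 requests (assumed to include the current one)
WINDOW_SECONDS = 5   # request count over the client's last 5 seconds (current request excluded)

# Hypothetical names for the processed fields of Table 2.2
FEATURES = ["packet_size", "requests_past_5s", "average_packet_size",
            "time_taken", "average_time_taken"]


def _count_recent(times: pd.Series) -> pd.Series:
    """For one client's time-sorted requests, count earlier requests within WINDOW_SECONDS."""
    t = times.to_numpy()
    # Position of the first request whose time lies inside (t - WINDOW_SECONDS, t]
    first_in_window = np.searchsorted(t, t - WINDOW_SECONDS, side="right")
    return pd.Series(np.arange(len(t)) - first_in_window, index=times.index)


def _rolling_mean(values: pd.Series) -> pd.Series:
    return values.rolling(WINDOW_REQUESTS, min_periods=1).mean()


def extract_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn one raw simulation log (Table 1 columns) into the processed fields of Table 2.2."""
    df = raw.sort_values("time", kind="stable").copy()
    per_client = df.groupby("ip")
    df["requests_past_5s"] = per_client["time"].transform(_count_recent)
    df["average_packet_size"] = per_client["packet_size"].transform(_rolling_mean)
    df["average_time_taken"] = per_client["time_taken"].transform(_rolling_mean)
    # ip, endpoint, headers and the raw timestamp are not kept as network inputs
    return df[FEATURES + ["compromised"]]


def merge_and_split(raw_datasets, train_fraction=0.8, seed=0):
    """Merge the processed data sets, shuffle them, and split them 80/20."""
    merged = pd.concat([extract_features(d) for d in raw_datasets], ignore_index=True)
    shuffled = merged.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    cut = int(len(shuffled) * train_fraction)
    return shuffled.iloc[:cut], shuffled.iloc[cut:]
```

Features are computed per data set before merging, so that the rolling windows never mix requests from different simulation runs; shuffling happens only afterwards.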
Table 2.1 shows the fields of the raw data collected during the simulation, and Table 2.2 shows the fields of the processed data, which will be used as input for the neural network.


Field       | Description
ip          | The IP address of the client which made the request
time        | The time at which the request was received by the server, in seconds since epoch
packet_size | The size of the response packet sent by the server, in bytes
time_taken  | The time taken for the server to fulfill the request, in seconds
compromised | The nature of the request: whether it was compromised or not

Table 2.1: Fields for raw data


20 Tokuç, A. Aylin. “Splitting a Dataset into Train and Test Sets | Baeldung on Computer Science.”
Www.baeldung.com, 14 Jan. 2021, www.baeldung.com/cs/train-test-datasets-ratio. Accessed 21 Oct. 2022. 
Field               | Description
packet_size         | The size of the response packet sent by the server, in bytes
                    | The number of past requests received from the same client in the past 5 seconds
average_packet_size | The average packet size of the responses sent to the same client in the past 5 requests, in bytes
time_taken          | The time taken for the server to fulfill the request, in seconds
                    | The average time taken for the server to fulfill the requests of the same client in the past 5 requests, in seconds
compromised         | The nature of the request: whether it was compromised or not

Table 2.2: Fields for processed data
Since the processed data has aggregate and cumulative fields, each request carries information about some of the client's previous requests. Hence, the processed data effectively summarizes the raw data. Furthermore, more relationships can be found within the data, such as the one between the average time taken and the packet size of each request.


Figure 6.1: Average packet size for requests from client (bytes) vs average time taken for requests from client (seconds)
Figure 6.2: Packet size for requests from client (bytes) vs average time taken for requests from client (seconds)²¹


A relationship between the time taken and the packet size of requests is visually evident in Figures 6.1 and 6.2. The compromised and genuine requests form “clusters”, which could also be exploited by unsupervised models.²² Overall, the processed data effectively resembles and enhances the features of the raw data, which should help the network classify the requests effectively.


21 The diagrams are self-made using matplotlib and python 3.9
22 Mishra, Sanatan. “Unsupervised Learning and Data Clustering.” Medium, Towards Data Science, 19 May 2017, towardsdatascience.com/unsupervised-learning-and-data-clustering-eeecb78b422a. Accessed 14 Oct. 2022.


3.3 Dependent Variables 


The variables measured in this paper are the classification success rate of the ANN (the percentage of requests it correctly classifies as malicious or real) and the time taken to classify a request.


Time
The time measured here is the time taken for the network to classify a request, not the time taken to train the model. Python’s time.time() function was used to record the time before and after each call to the network.


Accuracy
The accuracy measured was the accuracy of the network in correctly classifying requests from the evaluation data sets. The accuracy of the network is the number of correct classifications divided by the total number of classifications.
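Both dependent variables can be measured in one pass over the evaluation data. In this sketch, `classify` is a hypothetical callable returning True for a compromised request (for a trained Keras model it could wrap a call to the model and a 0.5 threshold). It uses `time.time()` as described above, although `time.perf_counter()` offers higher resolution for millisecond-scale timings.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_dependent_variables(classify: Callable[[Sequence[float]], bool],
                                features: Sequence[Sequence[float]],
                                labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (accuracy, mean seconds per classification) on an evaluation set."""
    correct = 0
    total_seconds = 0.0
    for x, label in zip(features, labels):
        start = time.time()                   # time before the call to the network
        prediction = classify(x)
        total_seconds += time.time() - start  # time after the call
        if prediction == label:
            correct += 1
    n = len(labels)
    accuracy = correct / n                    # correct classifications / total classifications
    return accuracy, total_seconds / n


# Hypothetical stand-in classifier: flags requests whose first feature exceeds 0.5
acc, seconds = measure_dependent_variables(
    lambda x: x[0] > 0.5, [[0.9], [0.1], [0.7], [0.2]], [True, False, False, False]
)
print(acc)  # 0.75
```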


3.4 Programming of the Feed Forward Neural Network 


The feed forward network was programmed using Python and TensorFlow. It took the 5 inputs listed in Table 2.2 (omitting the label) and had one output neuron, representing whether the request was compromised. The structure of the network is illustrated below.


Figure 7: The structure of the programmed feed forward neural network (5 inputs, n hidden layers of 8 neurons, and one output neuron), where n is the number of hidden layers²³
The size of the input layer equals the number of features in the dataset. As discussed before, the model was trained with different numbers of hidden layers. The values of the input layer are processed by these dense hidden layers, and the output layer condenses the activations of the last hidden layer into a single neuron.


The models were trained using the fit() method of TensorFlow’s Sequential model. During training, the model can descend into a local minimum of the cost function instead of the global minimum, and large batch sizes tend to converge to sharp minima that generalize poorly. Hence, the model was trained with a low batch size of 8, which allows it to generalize the pattern better and achieve higher accuracy.²⁴


3.5 Experimental Procedure 


Four networks were configured, each with an input layer of 5 neurons and an output layer of 1 neuron. Each network had a different number of hidden layers: 1, 2, 3, and 4, respectively. They were trained on the same training data, and their performance and accuracy were then recorded on the testing data.


23 Self-made using excalidraw. “Excalidraw.” Excalidraw, excalidraw.com/. 
24 Keskar, Nitish Shirish, et al. “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp
Minima.” ArXiv:1609.04836 [Cs, Math], 9 Feb. 2017, arxiv.org/abs/1609.04836. 
Additionally, neural networks are initialized with randomized weights and biases, so their training results may contain outliers and inconsistencies. Although a low batch size had already been chosen to minimize this error, two models were also programmed for each configuration to ensure that the observed patterns were not caused by the random initial state. The models were identical except for their seeds, “12345” and “98765”, and were nicknamed “Model 1” and “Model 2”. Both seeds were chosen arbitrarily.
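The experimental procedure of Sections 3.4 and 3.5 can be sketched with TensorFlow's Keras API: four configurations (1 to 4 hidden layers), each trained twice (once per seed) with a batch size of 8 for 16 epochs. Several details are assumptions, because the paper does not state them: ReLU activations in the hidden layers, a sigmoid output with binary cross-entropy loss, the Adam optimizer, 8 neurons per hidden layer (read from Figure 7), and the use of `tf.keras.utils.set_random_seed` (TensorFlow 2.7 or later) to apply the seeds. The inputs are assumed to be NumPy arrays of the five processed features and 0/1 labels.

```python
import numpy as np
import tensorflow as tf

SEEDS = {"Model 1": 12345, "Model 2": 98765}


def build_model(hidden_layers: int, n_features: int = 5, hidden_units: int = 8) -> tf.keras.Model:
    """5 inputs -> `hidden_layers` dense layers of 8 neurons -> 1 output neuron."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
    for _ in range(hidden_layers):
        model.add(tf.keras.layers.Dense(hidden_units, activation="relu"))  # assumed activation
    # A sigmoid output gives the probability that the request is compromised
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def run_experiment(x_train, y_train, x_test, y_test, epochs=16, batch_size=8):
    """Train every (seed, layer count) configuration and return its test accuracy."""
    results = {}
    for name, seed in SEEDS.items():
        for hidden_layers in (1, 2, 3, 4):
            # Seeds Python, NumPy and TensorFlow so the initial weights are reproducible
            tf.keras.utils.set_random_seed(seed)
            model = build_model(hidden_layers, n_features=x_train.shape[1])
            model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
            _, accuracy = model.evaluate(x_test, y_test, verbose=0)
            results[(name, hidden_layers)] = accuracy
    return results
```

Re-seeding before each model means that, within one seed, the only difference between configurations is the number of hidden layers, which is the variable the experiment sets out to change.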


3.6 Hypothesis 


I hypothesize that, because neural networks specialize in detecting patterns, the network will be able to detect a DDoS attack with high accuracy. Moreover, I expect the accuracy to increase as the number of hidden layers increases, as the network would be able to find more specific patterns.


4. The Experimental Results 


4.1 Accuracy over epochs 


Figure 8.1: Accuracy vs epochs with 1 hidden layer
Figure 8.2: Accuracy vs epochs with 2 hidden layers
Figure 8.3: Accuracy vs epochs with 3 hidden layers
Figure 8.4: Accuracy vs epochs with 4 hidden layers²⁵
The accuracy measured during training was the accuracy on the training data itself. The models were trained for 16 epochs; however, as seen in the figures above, their accuracy remained similar after the first two epochs.


4.2 Accuracy on testing data 


Figure 9: Accuracy (%) of Model 1 and Model 2 on testing data vs number of hidden layers²⁶


The networks had astonishingly high accuracies of over 99.98%. Both Model 1 and Model 2 achieved their highest accuracy with two hidden layers, and the accuracy gradually decreased as more layers were introduced. This contradicts my hypothesis, as I predicted that the accuracy would continue to rise.


25 Figures 8.1, 8.2, 8.3, and 8.4 were self-made using python and matplotlib
26 Self-made using python and matplotlib


4.3 Time taken for classification 


Figure 10: Average time taken to classify a request vs number of hidden layers²⁷


The time taken to classify a request increased with every hidden layer added. The networks took between 2 and 3.5 milliseconds to classify a request, a figure that includes the overhead of calling the TensorFlow function.


5. Analysis 


5.1 Analyzing Accuracy 
As shown by the 99.99% accuracy with 2 hidden layers, the neural network is highly successful in classifying requests and hence can help mitigate HTTP flood DDoS attacks. While this makes it a potent tool, its high accuracy may be attributable to factors that could make the network less effective in the real world. The dataset created was only a simulation of a real DDoS attack, so it contained significantly fewer individual clients, and the patterns of the requests were inherently limited. The data collected may therefore be too narrow, causing the network to adjust only to the given dataset and hence show a high accuracy.


27 Self-made using python and matplotlib


5.2 Analyzing Performance 


In terms of performance, as discussed before, the network is highly efficient. A request is usually classified in a few milliseconds, which is extremely low compared to the average time taken by an HTTP request, 500 ms.²⁸ Hence, the network would have a negligible impact when implemented as middleware for servers.
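Using the measured range of 2 to 3.5 milliseconds per classification and the cited average of 500 ms per HTTP request, the relative overhead of the classifier works out as:

$$\frac{2\ \text{ms}}{500\ \text{ms}} = 0.004 = 0.4\%, \qquad \frac{3.5\ \text{ms}}{500\ \text{ms}} = 0.007 = 0.7\%$$

Even the slowest configuration therefore adds less than 1% to an average request.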


5.3 Making sense of the drop in accuracy 


As seen in Figure 9 above, the network’s accuracy decreased when it was trained with 3 or 4 hidden layers, which was against my proposed hypothesis. The more layers a network has, the more trainable parameters influence its output. This means that Model 1 with 2 layers is not the same mathematical function as Model 1 with 4 layers. Although the function with 4 layers is more complicated, it does not guarantee better results.
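The growth in trainable parameters can be made concrete. Assuming, as Figure 7 suggests, 5 inputs, hidden layers of 8 neurons each, and one output neuron, each dense layer contributes one weight per input-output pair plus one bias per neuron. The sketch below counts them; it illustrates the formula and is not output taken from the trained models.

```python
def count_parameters(hidden_layers: int, n_inputs: int = 5,
                     hidden_units: int = 8, n_outputs: int = 1) -> int:
    """Trainable weights and biases of a fully connected feed forward network."""
    sizes = [n_inputs] + [hidden_units] * hidden_layers + [n_outputs]
    # Each dense layer has (inputs x outputs) weights plus one bias per output neuron
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


for n in range(1, 5):
    print(n, count_parameters(n))
# prints one pair per line: 1 57, 2 129, 3 201, 4 273
```

Each extra hidden layer adds 8 × 8 + 8 = 72 parameters, so the 4-layer network has more than twice as many free parameters as the 2-layer one while being trained on the same data.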


Each layer builds on the patterns found by the layer before it, so as more layers are added, finer and finer details are picked up and influence the output. However, when too many layers are added, the network overanalyzes patterns and starts fitting “noise”, meaningless variation in the data.²⁹ This leads to the model overfitting the training data, which in turn reduces its accuracy on unseen data.³⁰


28 Saunders, Orde. “How Long Does an HTTP Request Take? | Blog | Decade City.” Decadecity.net, 12 Mar. 
2014, decadecity.net/blog/2012/09/15/how-long-does-an-http-request-take. Accessed 28 Nov. 2022. 

29 “What Is Noise in ML.” Iguazio, www.iguazio.com/glossary/noise-in-ml. Accessed 17 Dec. 2022.

30 “What Is Overfitting? - Overfitting - AWS.” Amazon Web Services, Inc., aws.amazon.com/what-is/overfitting/. 
Accessed 28 Dec. 2022. 






5.4 Computational Costs 


However, implementing a network comes with computational costs. Firstly, training a neural network requires a machine with extensive resources, especially RAM and CPU. With larger datasets, training can take days, if not weeks, so the availability of system resources is essential. Furthermore, because the batch size is small, the network's weights are updated many more times per epoch, which further increases the training time.
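The effect of a small batch size follows from simple arithmetic: one epoch performs ceil(N / batch size) weight updates, each a separate forward and backward pass. The sketch below assumes a hypothetical dataset of 100,000 preprocessed requests.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates (batches) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset of 100,000 preprocessed requests.
for batch_size in (8, 32, 256):
    print(batch_size, steps_per_epoch(100_000, batch_size))
```

Here a batch size of 8 needs 12,500 updates per epoch against 391 for a batch size of 256. Wall-clock time does not scale exactly with the step count, since hardware utilization also changes with batch size, but the number of sequential updates grows inversely with it.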


Moreover, using the network as middleware has further resource implications. The network must be loaded into RAM, and the CPU is used while classifying each request, so the network consumes resources that the server could otherwise use.
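A rough lower bound on the RAM taken by the loaded network is its parameter count multiplied by the bytes per parameter; Keras stores weights as 32-bit floats (4 bytes) by default. The sketch ignores the framework's own memory overhead, which for a small network is likely to dominate, and the parameter counts used are hypothetical.

```python
def weights_memory_bytes(num_params, bytes_per_param=4):
    """Memory taken by the weights alone (float32 = 4 bytes per parameter)."""
    return num_params * bytes_per_param


def human_readable(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TiB"


# Hypothetical parameter counts for a small and a much larger network.
for params in (913, 10_000_000):
    print(params, human_readable(weights_memory_bytes(params)))
```

A network with 913 parameters needs only about 3.6 KiB for its weights, whereas one with 10 million parameters needs about 38.1 MiB, before any framework overhead.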


6. Conclusion 

This experiment sought to determine how effectively, and to what extent, a feed-forward neural network can identify and mitigate HTTP flood DDoS attacks. The feed-forward network I set up successfully identified DDoS attacks so that they could be mitigated, while remaining highly performant and having a minimal impact on request timings. This is in line with, and supports, my hypothesis.


7. Further Research Opportunities


7.1 Investigating a change in the preprocessing of the data


The feature extraction for this experiment used several average and aggregate values. For example, the average packet size was calculated over the past 5 requests. It would be worthwhile to measure how the network's accuracy changes when the preprocessing is altered, for instance by computing these values over the past 10 requests, or when more features are added.


7.2 Utilizing different machine learning models


After preprocessing, visual patterns were apparent in the data, and these could also be exploited by other models. Since different models recognize patterns in different ways, it would be valuable to investigate how well other machine learning models detect DDoS attacks. As mentioned before, models such as SVMs and random forests could be explored.


7.3 Extending to different DDoS attacks


This investigation only examined how HTTP flood attacks can be mitigated and did not look at other types of DDoS attacks, such as SYN and UDP floods. These attacks operate at different network layers and therefore have different input parameters, so it would be interesting to see how neural networks could be used to detect them.
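The first proposal in this section, recomputing the aggregate features over the past 10 requests instead of 5, amounts to changing a window size. The sketch below keeps a bounded queue of packet sizes per IP address and returns the rolling average after each request. It illustrates the idea under assumptions and is not the preprocessing code used in this experiment: the class name is hypothetical, and averaging over fewer values until an IP has sent a full window of requests is an assumption about how the first requests are handled.

```python
from collections import defaultdict, deque


class RollingPacketStats:
    """Average packet size over each IP's most recent `window` requests."""

    def __init__(self, window=5):
        self.window = window
        # One bounded queue per IP; the oldest size drops off automatically.
        self.history = defaultdict(lambda: deque(maxlen=self.window))

    def update(self, ip, packet_size):
        """Record a request and return the IP's current rolling average."""
        sizes = self.history[ip]
        sizes.append(packet_size)
        return sum(sizes) / len(sizes)


stats5 = RollingPacketStats(window=5)
stats10 = RollingPacketStats(window=10)
for size in [100, 200, 300, 400, 500, 600, 700]:
    avg5 = stats5.update("203.0.113.7", size)
    avg10 = stats10.update("203.0.113.7", size)
print(avg5, avg10)
```

After seven requests, the 5-request window averages only the last five sizes (500.0), while the 10-request window still averages all seven (400.0), so the longer window reacts more slowly to a sudden change in an IP's behaviour.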
Appendix
1: Code for simulation of DDoS
The following code was used to provision multiple virtual private servers around the globe, deploy the client code on them, and then run the DDoS simulation. It also includes the code for the host server, which received the requests and stored them in files.


Client that acts as a malicious or compromised user: 


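import os
import random
import threading
import time
from time import sleep

import requests
from dotenv import load_dotenv
from flask import Flask, request

# Route paths, the query-parameter name, request payloads, the delay between
# genuine requests and the response messages are illegible in the source;
# the values used for them below are placeholders.
USER_PAYLOAD = {"type": "user"}
MALICIOUS_PAYLOAD = {"type": "malicious"}
USER_DELAY_SECONDS = 1

load_dotenv()  # loads SERVER_IP_ADDRESS (the target server) from a .env file
app = Flask(__name__)

user_endpoints = ['/', '/endpoint-1', '/endpoint-2', '/endpoint-3']
malicious_endpoints = ['/endpoint-3', '/endpoint-4', '/endpoint-5']


def get_url(endpoint):  # helper name illegible in the source; placeholder
    return f"http://{os.getenv('SERVER_IP_ADDRESS')}{endpoint}"


def make_request_to_endpoint(endpoint, data):
    url = get_url(endpoint)
    print(f"Making request to {url}")
    requests.get(url, json=data)  # HTTP method illegible; GET assumed


# Each behaviour runs in its own thread and stops when its flag is cleared
user_thread: threading.Thread = None
user_running = False

malicious_thread: threading.Thread = None
malicious_running = False


def run_user(timetorun):
    """Send genuine-looking requests for timetorun seconds or until stopped."""
    global user_running
    start = time.time()
    while start + timetorun > time.time():
        if not user_running:
            break
        make_request_to_endpoint(random.choice(user_endpoints), USER_PAYLOAD)
        sleep(USER_DELAY_SECONDS)


def run_malicious(timetorun):
    """Flood the malicious endpoints for timetorun seconds or until stopped."""
    global malicious_running
    start = time.time()
    while start + timetorun > time.time():
        if not malicious_running:
            break
        make_request_to_endpoint(random.choice(malicious_endpoints), MALICIOUS_PAYLOAD)


@app.route('/user')
def normal():
    global user_thread, user_running
    args = request.args
    timetorun = float(args.get('time', 0))
    if timetorun <= 0:
        return 'Invalid time', 400
    if user_running:
        return 'Already running', 400
    user_running = True
    user_thread = threading.Thread(target=run_user, args=(timetorun,))
    user_thread.start()
    return 'Started'


@app.route('/stop-user')
def stop_user():
    global user_thread, user_running
    if user_running:
        user_running = False  # run_user sees the flag and leaves its loop
        user_thread = None
        return 'Stopped'
    else:
        return 'Not running'


@app.route('/malicious')
def malicous():
    global malicious_thread, malicious_running
    args = request.args
    timetorun = float(args.get('time', 0))
    if timetorun <= 0:
        return 'Invalid time', 400
    if malicious_running:
        return 'Already running', 400
    malicious_running = True
    malicious_thread = threading.Thread(target=run_malicious, args=(timetorun,))
    malicious_thread.start()
    return 'Started'


@app.route('/stop-malicious')
def stop_malicious():
    global malicious_thread, malicious_running
    if malicious_running:
        malicious_running = False  # run_malicious sees the flag and stops
        malicious_thread = None
        return 'Stopped'
    else:
        return 'Not running'


@app.route('/')
def index():
    return 'OK'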
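Host server that received the requests and stored them in files:


import queue
from datetime import datetime, timedelta

import pandas as pd
from flask import Flask, request

# The flush interval, the output file name and the way the packet size is
# measured are illegible in the source; the values below are placeholders.
FLUSH_INTERVAL_SECONDS = 60
LOG_FILE = 'logs.csv'
COLUMNS = ['ip', 'endpoint', 'packet_size', 'headers']  # columns shown in Appendix 3

app = Flask(__name__)
log_queue = queue.Queue()
logs = None  # every request logged so far, as a DataFrame
endpoints = ['/', '/endpoint-1', '/endpoint-2', '/endpoint-3', '/endpoint-4', '/endpoint-5']


@app.before_request
def log_request_info():
    # Only requests to the simulated endpoints are recorded
    if request.path not in endpoints:
        return
    packet_size = len(request.get_data())  # placeholder measure of request size
    log_queue.put([request.remote_addr, request.path, packet_size, str(request.headers)])


@app.route('/')
def hello_world():
    return 'Hello World'


@app.route('/endpoint-1')
def hello_world_1():
    return 'Hello World'


@app.route('/endpoint-2')
def hello_world_2():
    return 'Hello World'


@app.route('/endpoint-3')
def hello_world_3():
    return 'Hello World'


@app.route('/endpoint-4')
def hello_world_4():
    return 'Hello World'


@app.route('/endpoint-5')
def hello_world_5():
    return 'Hello World'


def save_logs():
    """Periodically move queued request records into a CSV file."""
    global logs
    while True:
        end = datetime.now() + timedelta(seconds=FLUSH_INTERVAL_SECONDS)
        rows = []
        while datetime.now() < end:
            try:
                rows.append(log_queue.get(timeout=1))
            except queue.Empty:
                continue

        new_logs = pd.DataFrame.from_records(rows, columns=COLUMNS)
        if len(new_logs) > 0:
            logs = new_logs if logs is None else pd.concat([logs, new_logs])
            logs.to_csv(LOG_FILE, index=False)


if __name__ == '__main__':
    import threading
    saver = threading.Thread(target=save_logs)
    saver.start()
    app.run(host='0.0.0.0', port=80)  # port 80 inferred from the Host header in Appendix 3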


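Script used to provision the servers and run the simulation:


# ... marks arguments and steps that are illegible in the source.
import concurrent.futures
import sys
from datetime import datetime
from time import sleep

import dotenv
import linode as lin  # wrapper around the Linode API; its source is not part of this listing
import requests


def debug(message):
    # Prefix each message with the current time (format string is a placeholder)
    print(f"[{datetime.now().strftime('%H:%M:%S')}] {message}")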


def delete():
    linodes = lin.get_linodes()
    print(f"Deleting {len(linodes)} servers")
    for linode in linodes:
        linode.delete()

    lin.delete_all()


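NUM_SERVERS = 35  # placeholder: 30 user + 5 malicious servers, as used in run()


def create():
    regions = lin.get_regions()
    print(f"Found {len(regions)} regions")

    linodes = lin.get_linodes()
    print(f"Found {len(linodes)} existing servers")

    if len(linodes) > 0:
        debug("Servers already exist; run delete first")
        return

    # Spread the servers across the regions round-robin and create them in parallel
    args = [(regions[i % len(regions)],) for i in range(NUM_SERVERS)]
    run_concurrent(_create, args)


def _create(region):
    # The remaining arguments (including a value read with os.getenv) are illegible
    return lin.make_linode(region, ...)


def run_concurrent(fn, args_list):
    """Call fn(*args) for every tuple in args_list using a thread pool."""
    results = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(fn, *args) for args in args_list]
        for future in concurrent.futures.as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Error: {e}")

    return results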
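def run():
    linodes = lin.get_linodes()
    if len(linodes) < 35:
        debug(f"Expected 35 servers, found {len(linodes)}")
        return

    users = []
    malicious = []
    for i in range(30):
        users.append(linodes[i])
    for i in range(30, 35):
        malicious.append(linodes[i])

    debug(f"{len(users)} user servers")
    debug(f"{len(malicious)} malicious servers")

    # ... The remaining steps (debug messages, sleep() calls, moving IPs between
    # lists with remove()/append(), requests.get(...) calls to the client servers,
    # and a final loop over the servers) are illegible in the source.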


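commands = {
    'delete': delete,
    'create': create,
    'run': run,
}


def main():
    dotenv.load_dotenv()
    if len(sys.argv) < 2:
        print(f"Usage: specify one of {list(commands.keys())}")
        return

    command = sys.argv[1]
    if command not in commands:
        print(f"Unknown command; expected one of {list(commands.keys())}")
        return

    commands[command]()


if __name__ == '__main__':
    main()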


Bash code for deploying the client code on the servers:


# ... marks arguments and commands that are illegible in the source.
if [ -f /etc/apt/sources.list ]; then
    apt ...
    apt ...
    apt ...
    apt-get ...
else
    ...
fi

if [ ! -d /root/ddos ]; then
    ...
fi

echo ...
systemctl ...
systemctl ...
systemctl ...


2: Code for pre-processing data, training the network, and running tests


The code below was used to pre-process the previously collected data and train the model. It was run with TensorFlow in Visual Studio Code on a Lenovo Legion laptop.


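# ... marks arguments and statements that are illegible in the source.
import json
import os
import time

import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd
from sklearn.model_selection import train_test_split


def get_model(n):
    """Build the feed-forward network with n hidden layers."""
    import tensorflow as tf

    model = tf.keras.Sequential()
    # ... model.add(...) calls for the hidden and output layers, and the
    # arguments to model.compile(...), are illegible in the source.
    return model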


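def get_processed_data(i):
    df = pd.read_csv(f"...")  # path of raw data file i

    def process_ip(ip, df):
        # ... Builds the aggregate features (such as the average packet size
        # over the past 5 requests) for one IP address; apart from two .sum()
        # aggregates, the body is illegible in the source.
        ...

    processed = []
    for ip in df['ip'].unique():
        processed.append(process_ip(ip, df))

    return pd.concat(processed)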


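def preprocess():
    # Process each of the 5 raw data files and save the result
    for i in range(5):
        processed = get_processed_data(i)
        processed.to_csv(f"...")

    # Combine the first 4 processed files into one dataset
    frames = []
    for i in range(4):
        frames.append(pd.read_csv(f"..."))
    data = pd.concat(frames)
    data.to_csv("...")

    labels = data.pop("...")  # label column name illegible
    # ... Shuffling with .sample(...) and three .to_csv(...) calls that save
    # the resulting datasets are illegible in the source.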


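def train_model(i):
    X = pd.read_csv("...")
    y = pd.read_csv("...")
    model = get_model(i)
    history = model.fit(X, y)  # further fit arguments illegible
    with open(f"...", "w") as f:
        json.dump(history.history, f)

    model.save(f"models/model_{i}.tf")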


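def train():
    import tensorflow as tf

    # ... a train_test_split(...) call whose arguments are illegible

    tf.random.set_seed(...)  # seed value illegible

    for i in range(1, 5):
        print()
        print()
        print(f"Training model with {i} hidden layer(s)")  # message is a placeholder
        train_model(i)
        print()
        print()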


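def evaluate_model(n=2):
    import tensorflow as tf

    results = {}
    X = pd.read_csv("...")
    y = pd.read_csv("...")
    model = tf.keras.models.load_model(f"models/model_{n}.tf")

    for i in range(len(X)):
        # Classify one request at a time so that the time per request can be measured
        prediction = model(X.iloc[[i]].to_numpy()).numpy()[0][0]
        # ... timing, progress output and bookkeeping are illegible in the source

    # ... accuracy is computed as 1 - len(<misclassified>) / len(<all requests>)
    print(f"...")

    return results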


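def evaluate():
    res = []
    if os.path.exists("..."):  # cached evaluation results
        with open("...") as f:
            res = json.load(f)
    else:
        for i in range(1, 5):
            print()
            print()
            print()
            print()
            print(f"Evaluating model with {i} hidden layer(s)")  # message is a placeholder
            print()
            with open(f"...") as f:  # training history saved by train_model(i)
                history = json.load(f)

            results = evaluate_model(i)

            # keys other than "history" (used by the plots below) are illegible
            res.append({"history": history, "results": results})

        with open("...", "w") as f:
            json.dump(res, f)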


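    # Bar chart of the accuracy of Model 1 and Model 2 for each number of
    # hidden layers.
    plt.figure(...)
    ax = plt.gca()
    ax.set_ylim([..., 1])
    ax.xaxis.set_major_locator(...)
    # ... The bar positions, the two plt.bar(...) calls, plt.xticks(...),
    # plt.title(...), plt.legend([...]), plt.xlabel(...), plt.ylabel(...) and
    # plt.savefig(f"...") arguments are illegible in the source.
    plt.cla()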


    # Bar chart of the average time taken to classify a request
    ax = plt.gca()
    ax.xaxis.set_major_locator(mticker...)  # locator illegible
    # ... bar positions, the data for the two models and the two
    # plt.bar(..., label='Model 1') / plt.bar(..., label='Model 2') calls are illegible

    plt.xticks([r + ... for r in range(len(res))],
               [i + 1 for i in range(len(res))])
    plt.title("Average time taken for models with different layers")
    plt.xlabel('Number of Hidden Layers')
    plt.ylabel('Average time taken to classify request (milliseconds)')
    plt.legend(["Model 1", "Model 2"], loc="upper left")
    plt.savefig('diagrams/time_taken.png')


    # Training accuracy against epoch number for each number of hidden layers
    for i in range(4):
        plt.cla()
        plt.plot(...)  # Model 1 training history for i + 1 layers
        plt.plot(...)  # Model 2 training history for i + 1 layers
        plt.title(f"Model Accuracy vs Epochs ({i+1} layer{'s' if i else ''})")
        plt.ylabel("Accuracy")
        plt.xlabel("Epoch Number")
        plt.gca().set_ylim([0.9993, 1])
        plt.legend(["Model 1", "Model 2"], loc="lower right")
        plt.savefig(f"diagrams/model_{i+1}.png")

    plt.cla()


import tensorflow as tf
model = tf.keras.models.load_model("models/model_2.tf")

evaluate()
3: Screenshot of raw data

ip          endpoint    packet_size  headers
139.144.4   /           100          Host: 165.232.182.157User-Agent: python-requests/2.28.:
170.187.1   /endpoint   359          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.162.1   /endpoint   871          Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.9   /           56           Host: 165.232.182.157User-Agent: python-requests/2.28.:
170.187.1   /endpoint   470          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   348          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   885          Host: 165.232.182.157User-Agent: python-requests/2.28.:
66.175.22   /           90           Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.162.1   /endpoint   971          Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.9   /           90           Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.1   /endpoint   253          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   446          Host: 165.232.182.157User-Agent: python-requests/2.28.:
209.97.18   /           79           Host: 165.232.182.157User-Agent: python-requests/2.28.:
170.187.1   /endpoint   790          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.59.77   /endpoint   476          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   435          Host: 165.232.182.157User-Agent: python-requests/2.28.:
66.175.22   /endpoint   422          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.162.1   /endpoint   387          Host: 165.232.182.157User-Agent: python-requests/2.28.:
209.97.13   /endpoint   261          Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.1   /endpoint   335          Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.9   /endpoint   256          Host: 165.232.182.157User-Agent: python-requests/2.28.:
209.97.18   /           33           Host: 165.232.182.157User-Agent: python-requests/2.28.:
170.187.1   /endpoint   417          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   610          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.59.77   /           49           Host: 165.232.182.157User-Agent: python-requests/2.28.:
66.175.22   /           59           Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /           47           Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.162.1   /endpoint   363          Host: 165.232.182.157User-Agent: python-requests/2.28.:
209.97.13   /           60           Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.1   /endpoint   545          Host: 165.232.182.157User-Agent: python-requests/2.28.:
209.97.18   /endpoint   300          Host: 165.232.182.157User-Agent: python-requests/2.28.:
170.187.1   /endpoint   312          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.59.77   /endpoint   263          Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.9   /           88           Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   941          Host: 165.232.182.157User-Agent: python-requests/2.28.:
66.175.22   /endpoint   305          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.144.4   /endpoint   275          Host: 165.232.182.157User-Agent: python-requests/2.28.:
172.105.1   /endpoint   272          Host: 165.232.182.157User-Agent: python-requests/2.28.:
139.162.1   /           30           Host: 165.232.182.157User-Agent: python-requests/2.28.:
209.97.13   /endpoint   476          Host: 165.232.182.157User-Agent: python-requests/2.28.:


4: Screencast of model evaluation 


timestamp    time_taken  compromised
1674598718   0.05771     FALSE
1674598719   0.09539     FALSE
1674598721   0.05118     FALSE
1674598722   0.07691     FALSE
1674598723   0.08613     FALSE
1674598723   0.14932     FALSE
1674598723   0.10248     FALSE
1674598725   0.05419     FALSE
1674598725   0.05789     FALSE
1674598726   0.03681     FALSE
1674598727   0.06568     FALSE
1674598727   0.13238     FALSE
1674598728   0.07384     FALSE
1674598728   0.12131     FALSE
1674598728   0.11953     FALSE
1674598729   0.08191     FALSE
1674598729   0.11246     FALSE
1674598730   0.06358     FALSE
1674598730   0.11112     FALSE
1674598730   0.07229     FALSE
1674598732   0.0804      FALSE
1674598732   0.03928     FALSE
1674598732   0.15398     FALSE
1674598733   0.10205     FALSE
1674598733   0.06756     FALSE
1674598734   0.07142     FALSE
1674598734   0.04411     FALSE
1674598734   0.06323     FALSE
1674598734   0.03206     FALSE
1674598734   0.1496      FALSE
1674598736   0.14174     FALSE
1674598737   0.12996     FALSE
1674598737   0.08202     FALSE
1674598737   0.09897     FALSE
1674598737   0.13083     FALSE
1674598737   0.16102     FALSE
1674598738   0.08437     FALSE
1674598738   0.05136     FALSE
1674598739   0.07184     FALSE
1674598739   0.13159     FALSE


The models with four different numbers of hidden layers were tested on the training data, and their performance and accuracy were measured. The screencast of the code can be seen below.


https://youtu.be/b-iVnoBMfyo